Abstract

General relativity is a deterministic theory with non-fixed causal
structure. Quantum theory is a probabilistic theory with fixed causal
structure. In this paper we build a framework for probabilistic theories with
non-fixed causal structure. This combines the radical elements of general
relativity and quantum theory. We adopt an operational methodology
for the purposes of theory construction (though without committing to
operationalism as a fundamental philosophy). The key idea in the
construction is physical compression. A physical theory relates quantities.
Thus, if we specify a sufficiently large set of quantities (this is the
compressed set), we can calculate all the others. We apply three levels of
physical compression. First, we apply it locally to quantities (actually
probabilities) that might be measured in a particular region of spacetime.
Then we consider composite regions. We find that there is a second level
of physical compression for the composite region over and above the first
level physical compression for the component regions. Each application 
of first and second level physical compression is quantified by a matrix. 
We find that these matrices themselves are related by the physical the- 
ory and can therefore be subject to compression. This is the third level 
of physical compression. This third level of physical compression gives 
rise to a new mathematical object which we call the causaloid. From the 
causaloid for a particular physical theory we can calculate everything the 
physical theory can calculate. This approach allows us to set up a frame- 
work for calculating probabilistic correlations in data without imposing a 
fixed causal structure (such as a background time). We show how to put
quantum theory in this framework (thus providing a new formulation of 
this theory). We indicate how general relativity might be put into this 
framework and how the framework might be used to construct a theory 
of quantum gravity. 
1 Preliminary remarks



The great outstanding problem in theoretical physics left over from the last cen- 
tury is to find a theory of quantum gravity. A theory of quantum gravity (QG) 
is a theory which approximates quantum theory (QT) and general relativity 
(GR) in appropriate limits (including, at least, situations where those theories 
have already been experimentally verified). The problem is to go from two the- 
ories which are less fundamental to one which is more fundamental. Of course, 
it is possible, at least logically, that a theory of quantum gravity can be entirely 
formulated inside one of these two component theories. The main approaches 
to QG assume that the quantum framework is sufficient. Indeed, it is often 
stated that the problem is to quantize general relativity. In string theory (and 
its various descendants) an action is written down which defines the motion of 
strings (or membranes) on a fixed spacetime background [1]. This is formulated
entirely within the quantum framework. In loop quantum gravity Einstein's
field equations are written in canonical form (so we have a state across space
evolving with respect to some time parameter) and then quantization methods 
are applied [2]. In this paper we will not assume that QG can be formulated 
entirely within the standard quantum framework. Rather we will take a more 
evenhanded approach. We note that both GR and QT have conservative and 
radical features. 

General Relativity 

Conservative feature: General relativity is deterministic. Given suffi- 
cient information on a boundary, there is a unique solution for the 
physical observables in the theory. 

Radical feature: The causal structure is non-fixed. Whether a particular
interval $\delta x^\mu$ is spacelike or timelike is not specified in advance
but can only be determined once we have solved the Einstein field 
equations for the metric. 

Quantum Theory 

Conservative feature: The causal structure is fixed in advance. We will
elaborate on this in Sec. 2 below.

Radical feature: The theory is irreducibly probabilistic. That is to say, 
we cannot state the postulates of standard QT without reference to 
probabilities. 

It is curious that each theory is radical where the other is conservative. It is 
likely that QG must be radical in both cases. Thus, we take as our task to find 
a framework for physical theories which 

1. Is probabilistic. 

2. Admits non-fixed causal structure. 
If we are able to find such a framework then we can hope to formulate both QT
and GR as special cases. And, more importantly, we can expect that QG will 
also live in this framework. 

To begin we need a starting point. Fortunately, if we look back to the 
historical conceptual foundations of GR and modern QT we see that they have 
in common a certain operationalism. In his 1916 review paper "The foundation 
of the general theory of relativity" Einstein motivates the crucial requirement of 
general covariance in various ways by appealing to operational reasoning. For 
example, he says 

All our space-time verifications invariably amount to a determina- 
tion of space-time coincidences. (...) Moreover, the results of our 
measurings are nothing but verifications of such meetings of the 
material points of our measuring instruments with other material 
points, coincidences between the hands of a clock and points on the 
clock dial, and observed point-events happening at the same place 
and the same time.

The introduction of a system of reference serves no other purpose
than to facilitate the description of the totality of such coincidences [3]

(and hence, since these coordinates are merely abstract labels, the laws of 
physics must be invariant under general coordinate transformations). The first 
sentence of Heisenberg's 1925 paper "Quantum-theoretical re-interpretation of 
kinematic and mechanical relations", which marked the birth of modern
quantum theory, reads

The present paper seeks to establish a basis for theoretical quantum 
mechanics founded exclusively upon relationships between quantities 
which in principle are observable [4].

Heisenberg was, of course, very much influenced by the operationalism of Ein- 
stein. Given this common starting point for the two theories, it makes sense 
to adopt it here also. Thus, we will adopt an operational methodology. Be- 
fore proceeding, it is important to qualify this. We are adopting an operational 
methodology for the purposes of theory construction. This does not commit 
us to operationalism as a fundamental philosophy (in which the reality of any- 
thing beyond the operational realm is denied). Operationalism is a potentially 
powerful methodology precisely because it can remain neutral about what is 
happening beyond the operational realm and consequently enable us to make 
statements about a physical situation we know, at least, are not wrong. 

We will try to be particularly careful to formulate a version of operationalism 
that is useful for our purposes. The key aspect of the operational realm is that 
it is possible to accumulate data recording the settings of the instruments and 
the outcomes of measurements. Hence, our starting point will be the following 
assertion 
Assertion: A physical theory, whatever else it does, must correlate recorded
data. 

Of course, a physical theory may do much more than correlate data - it may 
provide an explanation of what happens, it may provide a picture of reality, it 
may provide a unified description of diverse physical situations. However, in 
order that a physical theory be considered as such, it must be capable of cor- 
relating data. Once again, it is important to assert that this does not commit 
us to an operational philosophy of physics. Nevertheless, the fact that physical 
theories must be capable of correlating data places constraints on the mathe- 
matical structures that can serve as such theories. We will look at how a theory 
can correlate data and find a general mathematical framework for physical the- 
ories. Operationalism can be regarded as a kind of conceptual scaffolding used 
to construct this mathematical framework. Once we have found this framework 
we are free, should we wish, to disregard the scaffolding and regard the math- 
ematical framework as a fundamental description of the world. Something like 
this happened when we went from Einstein's operationally formulated version 
of special relativity to Minkowski's picture. 

In both GR and QT there is a matter of fact as to whether a particular 
interval is timelike or not (in GR we can only establish this after solving for 
the metric). In QG we expect the causal structure to be non-fixed as in GR. 
However, in standard quantum theory, there is no matter of fact as to the value 
of any non-fixed physical quantity unless it is measured (or specially prepared).
Hence, in QG we expect that there will be no matter of fact as to whether 
a particular interval is timelike or not unless a measurement is performed to 
determine it. This means that we cannot assume that there is some slicing of 
spacetime into a time ordered sequence of spacelike hypersurfaces. Many of the 
concepts we usually take for granted in physics, such as evolution, state at a 
given time, prediction, and preparation have to be re-examined in the light of 
these considerations. 

The formalism presented in this paper first appeared in [5]. The present
paper is almost self-contained, though a few proofs which do not appear here are
in [5].

2 Exploration of causal structure in QT 

In this section we will elaborate, as promised, on the nature of the fixed causal 
structure in QT. The most immediate manifestation of this is that the state in 
quantum theory is given by 

$|\psi(t)\rangle = U(t)\,|\psi(0)\rangle$    (1)

We see that there is a background time t which assumes that there is a certain 
fixed causal structure (past influences future) acting in the background. How- 
ever, a deeper insight into causal structure in QT is gained by thinking about 
the relationships between operators that pertain to distinct spacetime regions. 
If these two spacetime regions are spacelike separated then the operators should
commute. In this picture we are thinking of operators which act on the global 
Hilbert space. An alternative way of thinking is to imagine a local Hilbert space 
corresponding to each spacetime region. To be more specific consider two
spatially separated quantum systems with Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$ of dimension
$N_1$ and $N_2$ respectively. Let system 1 be acted upon by a quantum gate A. Let
system 2 be acted upon sequentially by three gates B, C, and D where gate 
B is spacelike separated from gate A. Denote the quantum operators associ- 
ated with the evolution due to each gate by A, B, C, and D (these operators 
pertain to the local Hilbert space of the corresponding system). Gates A and 
B are spacelike separated. Hence the appropriate way to combine the
operators A and B is to use the tensor product giving $A \otimes B$. (As an aside note
that the property that the global operators should commute follows if we write
$a = A \otimes I$ and $b = I \otimes B$ for the global operators where $I$ is the identity.)
Gates B and C are timelike separated and immediately sequential. Therefore
the appropriate way to combine the operators B and C is by the direct product
(composition) CB. Gates B and D are timelike separated but not immediately
sequential. The right way to combine operators B and D is to use what we will
call the question mark product [D?B]. The question mark product is defined by
[D?B]C = DCB. It is clearly a linear operator. We see that we have here three
different products. To choose the correct one we need to know, in advance, what 
the causal relation is between the two regions. We can only do this if we specify 
a particular causal structure in advance and hence this causal structure must be 
fixed. We will find a new product - the causaloid product - which unifies these 
three types of product treating them in the same way in the context of a more 
general framework. This will enable us to formulate a framework in which, in
general, we do not need to specify in advance whether a particular separation is 
timelike or spacelike (and, indeed, there may be no matter of fact as to whether 
the separation is timelike or spacelike). 
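
As a concrete check of the three products just described, the following is a minimal sketch using small integer matrices in plain Python. The particular matrices and the helper names (`matmul`, `kron`, `question_mark`) are illustrative assumptions and do not come from the formalism itself.

```python
# Toy illustration of the three products for 2x2 integer matrices.
# Matrices are lists of rows; the particular entries are arbitrary choices.

def matmul(x, y):
    """Ordinary matrix product x y (composition of operators)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def kron(x, y):
    """Tensor (Kronecker) product of x and y."""
    return [[x[i][j] * y[k][l]
             for j in range(len(x[0])) for l in range(len(y[0]))]
            for i in range(len(x)) for k in range(len(y))]

def question_mark(d, b):
    """Return the linear map C -> D C B, i.e. the object written [D?B]."""
    return lambda c: matmul(d, matmul(c, b))

I2 = [[1, 0], [0, 1]]
A = [[0, 1], [1, 0]]
B = [[1, 1], [0, 1]]
C = [[2, 0], [1, 1]]
D = [[1, 0], [3, 1]]

# Spacelike separation: tensor product; the global operators commute.
a_global = kron(A, I2)
b_global = kron(I2, B)
commute = matmul(a_global, b_global) == matmul(b_global, a_global)

# Timelike and immediately sequential: composition CB.
sequential = matmul(C, B)

# Timelike but not immediately sequential: [D?B] waits for a choice of C.
gap = question_mark(D, B)
filled = gap(C)
```

Each of the three cases needs a different function call, which mirrors the point made above: the causal relation has to be known before the operators can be combined.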

To gain a clue as to where this framework will come from consider the above 
example further. If we are given $A \otimes B$ then we can deduce A and B separately.
Likewise if we are given [D?B] we can deduce B and D separately. This second
case is not so obvious - physically what is happening is that it is possible to 
break any tight correlation between B and D by considering different possible 
C's. In these two cases, all the information available in the operators before 
they are combined remains available afterwards. However, if we are given the 
operator CB we cannot deduce C and B separately. The best way to under- 
stand the reason for this is that we can deduce the state for region CB from 
measurements on region B alone (since it is the same qubit which passes, in 
sequence, through these two regions). Consequently, there is a reduction in the 
number of parameters required to specify the state for this composite region 
(unlike in the case of region AB). This is reflected by a reduction in the num- 
ber of parameters required to represent operators CB in the dual space. The 
reduction in the number of parameters required to specify the operator is due 
to correlations between the two regions coming from the physical theory itself 
(quantum theory in this case). There is a certain kind of physical compression. 
This is also the only case when there is a direct causal connection between the
two regions - within the context of this quantum circuit there is nothing that 
can be done to break the correlation between the two regions. We will say that 
the two regions are causally adjacent. Hence we see that causal adjacency is 
associated with a certain kind of physical compression (which we will call second 
level physical compression). It turns out that physical compression is the key 
- it is the mathematical signature of causal structure. Physical compression 
arises since the physical theory relates quantities and consequently we can have 
full information about all quantities by listing a subset (the remaining quanti- 
ties being deduced from this subset using relations deduced from the physical 
theory). We will use the notion of physical compression to formulate a general 
framework for probabilistic theories which do not require fixed causal structure. 

3 Collection and analysis of data 

In experiments we collect data. Data consists of (i) a record of actions taken 
(such as knob settings) and (ii) results of measurements and observations (for 
example observing that a detector clicks, or observing the reading of a clock). 
Typically, we will take note of data that is recorded in close proximity. For
example, we might note that, at time 02:52 according to a clock A which is 
proximate to the Stern-Gerlach apparatus B which was set at angle 55°, the 
detector corresponding to spin up clicked. Here we have three pieces of data 
(02:52, 55°, and spin up) all recorded in proximity. We will assume that such 
proximate data is recorded on a card (one card for each set of proximate data).
Thus, at the end of an experiment, a stack of cards will be accumulated where, 
on each card, proximate pieces of data are written as in this example. Of course, 
it is not necessary that cards are actually used - the data could be stored in 
a computer's memory, in a lab book, or in the brain of the experimentalist.
However, the story with the cards will help us set up the framework we are after. 
The notion of proximity is clearly a slightly vague one. On this matter, Einstein 
writes "We assume the possibility of verifying ... for immediate proximity 
or coincidence in space-time without giving a definition of this fundamental 
concept." It will ultimately boil down to a matter of convention and judgement 
as to what data counts as proximate. The convention aspect is under our control. 
Typically no two events will be exactly coincident so we will have to set a scale. 
If the two events occur to within this scale then we will say that they are 
proximate. The choice of scale is a convention. So long as we stick with a 
consistent convention then there is no problem. However, there is still a matter 
of experimental judgement in asking whether two events are proximate to within 
this scale. The judgement aspect is not so much under our control. 

We will assume that the first piece of data, x, on each card corresponds 
to some observation which we will regard as specifying location. We will have 
in mind that this corresponds to space-time location (although it is not strictly 
necessary that this is the case). For example, x could be the space-time location
read off some actual physical space-time reference frame. It could be the GPS 
position given by the retarded times of four clocks situated on four appropriately
moving satellites. The remaining data on a card is a record of actions (e.g. knob 
settings) and observations. The choice of actions is allowed to depend on x. A 
simple example is where the data stored on the card is of the form (x, F(x), s)
where x is the location data, F(x) represents the choice of actions such as
knob settings (this depends on x) and s represents outcomes of measurements
(for example spin measurements). In the case that there are multiple knobs 
F is a multivariable object, and if there are multiple measurement outcomes 
obtained at this location then s is, likewise, a multivariable object. We could 
consider more complicated examples such as (x, r, F(x, r), s) where r is data 
that is not regarded as part of the data representing location but on which the 
choice of action can, nevertheless, depend. We will illustrate these ideas with 
two examples 

Probes drifting in space. We imagine a number of probes (n = 1, 2, ...)
drifting in space. Each probe has a clock with reading $t_n$, some knobs
which control the settings, F(n, x), of various measurements (such as
Stern-Gerlach orientations) and some meters with readings $s_n$. At each tick of
the clock on each probe we record on a separate card

$(x = (t_n, \{t_n^m\}), n, F(n, x), s_n)$

where $t_n^m$ represents the retarded times seen at probe n on the other
probes (we could choose just a subset of the other probes here). At the
end of the experiment, we will end up with one card for each tick of each
probe.

Sequence of spin measurements. Imagine a sequence of five spin measurements
performed on a single spin-half particle emitted from a source. We
label the spin apparatuses x = 1 to x = 5 and the source x = 0. At the
source we collect a card with data x = 0 followed by whatever data is
recorded corresponding to the proper functioning of the source. At each
spin measurement we collect the data $(x, \theta(x), s)$ where $\theta$ is the orientation
of the spin measurement and s is the outcome (spin up or spin down). At
the end of the experiment we will have a stack of six cards.
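
To make the second example concrete, here is a minimal simulation sketch that produces one stack of six cards. It rests on assumptions not stated above: the source is taken to be unpolarised (so the first outcome is 50/50), successive outcomes follow the standard spin-half rule that the outcome repeats with probability cos²(Δθ/2), and the tuple layout of a card and the name `run_spin_sequence` are hypothetical.

```python
import math
import random

def run_spin_sequence(theta, rng):
    """Return one stack of six cards for the sequence of spin measurements.

    theta maps x = 1..5 to an orientation in radians (theta(x) in the text).
    Each card is a tuple; this layout is an illustrative choice.
    """
    stack = [(0, "source ok")]        # card collected at the source, x = 0
    previous = None                   # (orientation, outcome) of the last measurement
    for x in range(1, 6):
        if previous is None:
            p_up = 0.5                # assumption: unpolarised source
        else:
            angle, outcome = previous
            # Standard spin-half rule: same outcome with probability cos^2(delta/2).
            p_same = math.cos((theta[x] - angle) / 2) ** 2
            p_up = p_same if outcome == "up" else 1 - p_same
        s = "up" if rng.random() < p_up else "down"
        stack.append((x, theta[x], s))    # the card (x, theta(x), s)
        previous = (theta[x], s)
    return stack

# With all five apparatuses aligned, every outcome repeats the first one.
aligned = {x: 0.0 for x in range(1, 6)}
stack = run_spin_sequence(aligned, random.Random(0))
```

Repeating the call with the same `theta` and a fresh random state gives another stack for the same choice of settings, which is the raw material for relative frequencies.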

There are many different possible choices for the function F (corresponding to 
the various possible choices of knob settings at different locations). We will 
imagine that the experiment is repeated for each possible function. Further, 
since we are interested in constructing a probabilistic theory, we will assume 
that the experiment can be repeated many times for each F so that we can 
construct relative frequencies. We will imagine that each time the experiment 
is performed the cards are bundled into a stack and tagged with a description 
of F. After having repeated the experiment many times for each F we will 
have a large collection of tagged bundled stacks of cards. To usefully repeat the 
experiment it may be necessary to reset some aspects of the setup such as the 
clocks. The notion of repeating the experiment is a problematic assumption if we
are in a cosmological setting. An alternative approach is discussed in 
We will imagine that this collection of tagged bundled stacks of cards is sent
to a man inside a sealed room for analysis. Our task is to invent a method by 
which the man in the sealed room can analyse the cards thereby developing a 
theory for correlating data. The order in which the cards are bundled into any 
particular stack does not, in itself, represent recorded data (all recorded data 
is written on the cards themselves). Consequently, the man in the sealed room 
should not take this into account in his analysis. To be sure of this we can 
imagine that the cards in each stack are shuffled before being bundled. The 
order of the stacks also does not represent data and so we can also imagine that 
the bundles themselves are also shuffled before being sent into the sealed room. 

The usefulness of this story with a man inside a sealed room is that he cannot
look outside the room for extra clues on how to analyse the data. Hence, he 
will necessarily be proceeding in accordance with an operational methodology 
as we discussed earlier. He will have to define all his concepts in terms of the 
cards themselves. We will now define some concepts in terms of the cards. 

The full pack, denoted by V, is the set of all logically possible cards over all 
x, all possible settings and all possible outcomes - any card that can be 
collected in the experiment must belong to V. 

An elementary region, denoted by $R_x$, is the set of all cards taken from V
which have some particular x written on them. 

A stack, denoted by Y, is the set of cards collected in one repetition of the
experiment.

A procedure, denoted by F, is the set of all cards taken from V which are
consistent with the given function F for the settings. We intentionally use the
same notation for the set and for the function since it will be clear from 
the context which meaning is implied and, in any case, the information 
conveyed is the same (the set F is a more cumbersome way of conveying 
this information but it will turn out to be useful below). 

It is worth noting that we must have $Y \subseteq F \subseteq V$ for a stack Y tagged with
procedure F. We can define some more concepts in terms of these basic concepts. 

A region, denoted by $R_{\mathcal{O}_1}$, is equal to the union of all the elementary regions
$R_x$ for which $x \in \mathcal{O}_1$. That is

$R_{\mathcal{O}_1} = \bigcup_{x \in \mathcal{O}_1} R_x$    (2)

We will often abbreviate $R_{\mathcal{O}_1}$ by $R_1$.

The procedure in region $R_1$ is given by the set

$F_{R_1} \equiv F \cap R_1$    (3)

We will sometimes write this as $F_1$. It conveys the choice of measurement
settings in region $R_1$ (more accurately, it conveys the intended choice of
measurement settings).
The outcome set in region $R_1$ is given by

$Y_{R_1} \equiv Y \cap R_1$    (4)

We will sometimes write this as $Y_1$. It represents the outcomes seen in
this region.

Note that

$Y_1 \subseteq F_1 \subseteq R_1$    (5)

These definitions may appear a little abstract. However, the idea is very simple.
We regard the cards as belonging to regions, for example $R_1$. In this region we
have

$(Y_1, F_1) \Longleftrightarrow$ (outcomes in $R_1$, settings in $R_1$)    (6)

We will label each possible $(Y_1, F_1)$ in $R_1$ with $\alpha_1 = 1, 2, \ldots$. By analysing the
cards in terms of which regions they belong to the man in the sealed room can
form a picture of what happened during the experiment.
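
The set-theoretic definitions above translate directly into code. The following sketch builds a tiny full pack and checks the two inclusion chains. The card layout `(x, setting, outcome)`, the particular locations, settings and outcomes, and the function names are all illustrative assumptions.

```python
from itertools import product

# A card is (x, setting, outcome); the layout and values are illustrative.
locations = [1, 2, 3]
settings = ["a", "b"]
outcomes = ["up", "down"]

V = frozenset(product(locations, settings, outcomes))   # the full pack

def elementary_region(x):
    """All cards in V with the given x written on them."""
    return frozenset(card for card in V if card[0] == x)

def region(xs):
    """Union of the elementary regions R_x for x in xs."""
    return frozenset().union(*(elementary_region(x) for x in xs))

def procedure(settings_function):
    """All cards in V consistent with the settings function (a dict x -> setting)."""
    return frozenset(card for card in V
                     if card[1] == settings_function[card[0]])

F = procedure({1: "a", 2: "b", 3: "a"})
Y = frozenset({(1, "a", "up"), (2, "b", "down"), (3, "a", "up")})   # one stack

R1 = region({1, 2})
F1 = F & R1      # procedure in region R1
Y1 = Y & R1      # outcome set in region R1

stack_chain_holds = Y <= F <= V
region_chain_holds = Y1 <= F1 <= R1
```

Because everything is a set of cards, the order in which cards were collected plays no role in any of these definitions.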

We are seeking to find a probabilistic theory which correlates data. It is 
worth thinking carefully about what this means. Probabilities must be condi- 
tional. Thus, we can talk about the probability of A given that condition B is 
satisfied. But even further, the conditioning must be sufficient for the probabil- 
ity to be well defined. For example, we can calculate the probability of a photon 
being detected in the horizontal output of a polarising beamsplitter given that, 
just prior to impinging on this beamsplitter, it passed through a polariser
orientated at 45° to the horizontal. This probability is well defined (and equal to
1/2). However, the probability that the photon will be detected in the horizontal
output of a polarising beamsplitter given that, just prior to impinging on this 
beamsplitter, it passed through a plane sheet of glass is not well defined. We 
would need more information to be able to calculate this probability. The lesson 
to be drawn from this is that it is not always possible to calculate probabilities. 
Thus, we will take as the task of the theory the following 

1. To be able to say whether a probability is well defined. 

2. If the probability is well defined to be able to calculate it. 

The first task is important and deserves further discussion. One way to think of 
this is to adopt an adversary model. Thus, imagine that we were to write down 
a certain probability for a photon being detected in the horizontal output of a 
polarising beamsplitter given that it had just passed through a plane sheet of 
glass. Whatever probability we write down, we could imagine some adversary 
who can ensure that this probability is wrong. For example, before the photon 
impinges on the sheet of glass, the adversary may send the photon through 
a polariser set at some angle he chooses such that our probability is wrong. 
However, when we have sufficient conditioning an adversary cannot do this. 
This is clear in the first example where the photon passes through a polariser 
set at 45° just prior to impinging on the polarising beamsplitter. How do we 
usually know whether a probability is well defined in physical theories? A little 
reflection will reveal that we usually know this by reference to some underlying
definite causal structure. For example, if we have a full specification of the 
boundary conditions in the past light cone of some region and we know what 
settings are chosen subsequent to these boundary conditions in this past light 
cone, then we can make well defined predictions for the probabilities in that
region. However, in the case that we do not have some well defined causal 
structure to refer to, we cannot proceed in this way. In the causaloid framework 
to be presented we will provide a more general way to answer the question of 
whether a probability is well defined. 
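
The adversary argument can be illustrated numerically. The sketch below assumes Malus's law for an ideal polariser and polarising beamsplitter, and treats the plane sheet of glass as leaving the polarisation unchanged; both are idealisations introduced here for illustration, as is the function name `p_horizontal`.

```python
import math

def p_horizontal(polariser_angle_deg):
    """Malus's law: probability of the horizontal output for a photon that has
    just passed a polariser at the given angle to the horizontal."""
    return math.cos(math.radians(polariser_angle_deg)) ** 2

# Sufficient conditioning: polariser at 45 degrees just before the beamsplitter.
p_well_defined = p_horizontal(45.0)

# Insufficient conditioning: we only know about the sheet of glass. An adversary
# may place a polariser at any angle upstream, so the probability we wrote down
# can be made wrong; here we scan a few of the adversary's choices.
adversary_angles = range(0, 91, 15)
achievable = [p_horizontal(angle) for angle in adversary_angles]
spread = max(achievable) - min(achievable)
```

In the first case a single number comes out whatever else is done upstream. In the second the achievable values cover the whole interval from 0 to 1, so no single number can be assigned.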

In the notation above, we wish first to know whether the probability 

$\mathrm{Prob}(Y_2 | Y_1, F_1, F_2)$    (7)

is (1) well defined and, if so, (2) what this probability is equal to, for all $(Y_1, F_1)$
and $(Y_2, F_2)$, for all pairs of regions $R_1$ and $R_2$. We will now develop a framework
which can do this. 
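
Before developing the framework, it may help to see what the man in the sealed room could do with the tagged stacks alone. The sketch below forms a relative-frequency estimate of the probability in (7). A finite count can only estimate such a probability and cannot by itself decide whether it is well defined; that is the job of the theory. The function name, the card layout and the sample data are illustrative assumptions.

```python
def estimate_conditional(runs, R1, R2, y1, f1, y2, f2):
    """Relative-frequency estimate of Prob(Y2 | Y1, F1, F2).

    runs is a list of (Y, F) pairs, one per repetition of the experiment,
    each given as a frozenset of cards. Returns None when no run satisfies
    the conditioning.
    """
    matching = [Y for Y, F in runs
                if F & R1 == f1 and F & R2 == f2 and Y & R1 == y1]
    if not matching:
        return None
    hits = sum(1 for Y in matching if Y & R2 == y2)
    return hits / len(matching)

# Illustrative data: cards are (x, setting, outcome) with a single setting "a".
up1, down1 = (1, "a", "up"), (1, "a", "down")
up2, down2 = (2, "a", "up"), (2, "a", "down")
R1, R2 = frozenset({up1, down1}), frozenset({up2, down2})
F = R1 | R2                      # the only procedure: setting "a" everywhere
runs = ([(frozenset({up1, up2}), F)] * 3
        + [(frozenset({up1, down2}), F)]
        + [(frozenset({down1, down2}), F)] * 2)

estimate = estimate_conditional(runs, R1, R2,
                                y1=frozenset({up1}), f1=F & R1,
                                y2=frozenset({up2}), f2=F & R2)
```

Here four of the six runs satisfy the conditioning and three of those show the chosen outcome in the second region.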

4 Three levels of physical compression 

4.1 Preliminaries 

Consider the probability 

$\mathrm{Prob}(Y | F)$    (8)

This is the probability that we see some stack Y given procedure F. It is unlikely 
that this probability is well defined since it is conditioned only on choices of 
knob settings and not on any actual outcomes. Thus, instead, we consider the 
probabilities 

$\mathrm{Prob}(Y_R | F_R, C_{V-R})$    (9)

where R is a large region (one containing a substantial fraction of the cards in
V), $Y_R$ and $F_R$ are the outcome set and procedure, respectively, in R, and
$C_{V-R}$ is some condition on $Y \cap (V - R)$ and $F \cap (V - R)$ (i.e. some condition
on what is seen and what is done in region $V - R$). We will assume that
the probabilities $\mathrm{Prob}(Y_R | F_R, C_{V-R})$ are well defined for all $Y_R$ and $F_R$. We
will restrict our attention to the case where condition $C_{V-R}$ is true and then
we will only consider what happens in region R. We might think of $C_{V-R}$
as corresponding to the conditions that go into setting up and maintaining a
laboratory (for example, setting up the lasers, ensuring that the blinds are kept
down, etc.). Since we are always taking $C_{V-R}$ to be true we will drop it from
our notation, writing $\mathrm{Prob}(Y_R | F_R)$. This way of setting up the framework is
not ideally suited to the cosmological context (where there is not any external
condition like $C_{V-R}$). Ways round this are discussed in

4.2 First level physical compression 

We will develop this framework by employing three levels of physical compression.
The first level of physical compression pertains to a single region $R_1$ (inside
R of course). We can write

$\mathrm{Prob}(Y_R | F_R) = \mathrm{Prob}(Y_{R_1} \cup Y_{R-R_1} | F_{R_1} \cup F_{R-R_1})$    (10)

We will think of $(Y_{R-R_1}, F_{R-R_1})$, which happens in $R - R_1$, as a generalised
preparation for what happens in region $R_1$ (we call it a generalised preparation
since it is not, in general, restricted to the past of $R_1$ - rather it pertains to the
past, the future, and to elsewhere in so much as these words have meaning in the
absence of definite causal structure). Further, we will think of each $(Y_{R_1}, F_{R_1})$ as
corresponding to some (measurement outcome, measurement choice) in region
$R_1$ - we label them with $\alpha_1$. Thus, we have $\alpha_1 \Leftrightarrow (Y_{R_1}^{\alpha_1}, F_{R_1}^{\alpha_1})$. We can now
write the above probability as

$p_{\alpha_1} = \mathrm{Prob}(Y_{R_1}^{\alpha_1} \cup Y_{R-R_1} | F_{R_1}^{\alpha_1} \cup F_{R-R_1})$    (11)

We will now define the state in region $R_1$ associated with a generalised preparation
in $R - R_1$ to be that thing represented by any mathematical object which
can be used to calculate $p_{\alpha_1}$ for all $\alpha_1$. Given this definition one mathematical
object which clearly suffices to represent the state is

$\mathbf{P}(R_1) = \begin{pmatrix} \vdots \\ p_{\alpha_1} \\ \vdots \end{pmatrix}$    (12)

We can write

$p_{\alpha_1} = \mathbf{R}_{\alpha_1}(R_1) \cdot \mathbf{P}(R_1)$    (13)

where $\mathbf{R}_{\alpha_1}(R_1)$ is a vector which has a 1 in position $\alpha_1$ and 0's everywhere else.
Now, in general, a physical theory will correlate these probabilities. This means 
that they will be related to each other. Hence, we should be able to specify 
the state by giving a shorter list of probabilities (than in P) from which all the 
other probabilities can be calculated. This provides some physical compression 
(compression due to the physical theory itself). In fact we can choose to stick 
with linear physical compression. Thus, we write the state as a just sufficient 
set of probabilities 

$\mathbf{p}(R_1) = \begin{pmatrix} \vdots \\ p_{l_1} \\ \vdots \end{pmatrix}, \qquad l_1 \in \Omega_1$    (14)

where there exist vectors $\mathbf{r}_{\alpha_1}(R_1)$ such that a general probability is given by the
linear equation

$p_{\alpha_1} = \mathbf{r}_{\alpha_1}(R_1) \cdot \mathbf{p}(R_1)$    (15)

Clearly this is possible since, as a last resort, we have (13). Since the
probabilities in $\mathbf{p}(R_1)$ are just sufficient for this purpose, there must exist a set of
$|\Omega_1|$ linearly independent states chosen from the allowed set of states. We will
call $\Omega_1$ the fiducial set in region $R_1$. The choice of fiducial set for a region is
unlikely to be unique. This does not matter. We can choose one set and stick
with it. We have just employed linear physical compression here. It is possible 
that if we employed more general mathematical physical compression (allowing 
non-linear functions) we could do better. This does not really matter since we 
are free to choose linear physical compression as the preferred form of physical 
compression. In fact, it can easily be proven that if we are able to form mix- 
tures of states (as we can in quantum theory) then we cannot do better than 
linear physical compression (this is not surprising since probabilities combine 
in a linear way when we form mixtures). It is worth noting that in first level 
physical compression we implement the label change 

ai — > h (16) 

as we go from the set of all ai's to the fiducial set The exact form of the 
first level physical compression is encoded in the vectors r Ql (since if we know 
these vectors we can undo the physical compression). We define the matrix 

AL^C (17) 

where rf 1 are the components of r Ql . The matrix A*^ tells us how to undo 
the first level physical compression. This matrix is likely to be very rectangular 
(rather than square). 
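
To make first level physical compression concrete, the following is a minimal sketch in Python. It assumes a toy theory (invented here purely for illustration) in which four labels `a1`...`a4` play the role of $\alpha_1$, and the table lists each $p_{\alpha_1}$ for three sample states. The helper names `express` and `first_level_compression` are hypothetical. The sketch picks a fiducial set greedily and solves for the vectors $\mathbf{r}_{\alpha_1}$, whose components form the matrix of (17).

```python
from fractions import Fraction

def express(row, basis):
    """Return c with row == sum_j c[j] * basis[j] (exact arithmetic), or None."""
    n = len(basis)
    # One linear equation per sample state s: sum_j basis[j][s] * c[j] = row[s].
    aug = [[Fraction(b[s]) for b in basis] + [Fraction(row[s])]
           for s in range(len(row))]
    pivots, r = [], 0
    for col in range(n):
        pivot = next((i for i in range(r, len(aug)) if aug[i][col] != 0), None)
        if pivot is None:
            continue
        aug[r], aug[pivot] = aug[pivot], aug[r]
        lead = aug[r][col]
        aug[r] = [v / lead for v in aug[r]]
        for i in range(len(aug)):
            if i != r and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[r])]
        pivots.append(col)
        r += 1
    # A row reading "0 = nonzero" means the target lies outside the span.
    if any(all(v == 0 for v in eq[:-1]) and eq[-1] != 0 for eq in aug):
        return None
    coeffs = [Fraction(0)] * n
    for i, col in enumerate(pivots):
        coeffs[col] = aug[i][-1]
    return coeffs

def first_level_compression(table):
    """table[alpha] lists p_alpha over sample states; returns (Omega, r vectors)."""
    omega = []  # fiducial labels, chosen greedily (the choice is not unique)
    for alpha, row in table.items():
        if express(row, [table[l] for l in omega]) is None:
            omega.append(alpha)
    lam = {alpha: express(row, [table[l] for l in omega])
           for alpha, row in table.items()}
    return omega, lam

# Illustrative numbers only: a3 = a1 + a2 and a4 = a1 / 2 in every sample state.
h, q, e = Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)
table = {"a1": [h, 0, q], "a2": [0, h, q], "a3": [h, h, h], "a4": [q, 0, e]}
omega, lam = first_level_compression(table)

# Undoing the compression: recover every p_alpha from the compressed state.
state = [Fraction(1, 3), Fraction(1, 6)]  # (p_a1, p_a2), again illustrative
full = {alpha: sum(c * p for c, p in zip(lam[alpha], state)) for alpha in table}
```

Here `omega` is the fiducial set and `lam[alpha]` is the row of the rectangular matrix that undoes the compression for label `alpha`.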

4.3 Second level physical compression 

Now we come to second level physical compression. This applies to two or more
disjoint regions and corresponds to the physical compression that happens over
and above the first level compression for the component regions. Consider just
two disjoint regions for the moment, $R_1$ and $R_2$.

$$p_{\alpha_1\alpha_2} = \mathrm{Prob}(Y^{\alpha_1}_{R_1} \cup Y^{\alpha_2}_{R_2} \cup Y_{R-R_1-R_2} \,|\, F^{\alpha_1}_{R_1} \cup F^{\alpha_2}_{R_2} \cup F_{R-R_1-R_2}) \tag{18}$$

where $\alpha_1$ and $\alpha_2$ label measurement plus outcomes in regions $R_1$ and $R_2$
respectively. Now we can reason as before. The state for region $R_1 \cup R_2$ is given by
any mathematical object which can be used to calculate all $p_{\alpha_1\alpha_2}$. Employing
first level linear physical compression as before we can write the state as

$$\mathbf{p}(R_1 \cup R_2) = \begin{pmatrix} \vdots \\ p_{k_1 k_2} \\ \vdots \end{pmatrix}, \qquad k_1 k_2 \in \Omega_{12} \tag{19}$$

where

$$p_{\alpha_1\alpha_2} = \mathbf{r}_{\alpha_1\alpha_2}(R_1 \cup R_2) \cdot \mathbf{p}(R_1 \cup R_2) \tag{20}$$

We will now prove that there always exists a choice of fiducial set $\Omega_{12}$ such that

$$\Omega_{12} \subseteq \Omega_1 \times \Omega_2 \tag{21}$$






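where $\times$ represents the cartesian product (e.g. $\{1, 2\} \times \{5, 6\} = \{15, 16, 25, 26\}$).
This result is central to the method employed in this paper. Second level physical
compression is nontrivial when $\Omega_{12}$ is a proper subset of $\Omega_1 \times \Omega_2$. To prove (21)
note that we can write $p_{\alpha_1\alpha_2}$ as

$$\begin{aligned}
p_{\alpha_1\alpha_2} &= \mathrm{Prob}(Y^{\alpha_1}_{R_1} \cup Y^{\alpha_2}_{R_2} \cup Y_{R-R_1-R_2} \,|\, F^{\alpha_1}_{R_1} \cup F^{\alpha_2}_{R_2} \cup F_{R-R_1-R_2}) \\
&= \mathbf{r}_{\alpha_1}(R_1) \cdot \mathbf{p}_{\alpha_2}(R_1) \\
&= \sum_{l_1 \in \Omega_1} r^{\alpha_1}_{l_1}(R_1)\, p^{\alpha_2}_{l_1}(R_1) \\
&= \sum_{l_1 \in \Omega_1} r^{\alpha_1}_{l_1}(R_1)\, \mathbf{r}_{\alpha_2}(R_2) \cdot \mathbf{p}_{l_1}(R_2) \\
&= \sum_{l_1 l_2 \in \Omega_1 \times \Omega_2} r^{\alpha_1}_{l_1} r^{\alpha_2}_{l_2}\, p_{l_1 l_2}
\end{aligned} \tag{22}$$

where $\mathbf{p}_{\alpha_2}(R_1)$ is the state in $R_1$ given the generalised preparation
$(Y^{\alpha_2}_{R_2} \cup Y_{R-R_1-R_2},\, F^{\alpha_2}_{R_2} \cup F_{R-R_1-R_2})$ in region $R - R_1$, and $\mathbf{p}_{l_1}(R_2)$ is the state in $R_2$
given the generalised preparation $(Y^{l_1}_{R_1} \cup Y_{R-R_1-R_2},\, F^{l_1}_{R_1} \cup F_{R-R_1-R_2})$ in region
$R - R_2$ and where

$$p_{l_1 l_2} = \mathrm{Prob}(Y^{l_1}_{R_1} \cup Y^{l_2}_{R_2} \cup Y_{R-R_1-R_2} \,|\, F^{l_1}_{R_1} \cup F^{l_2}_{R_2} \cup F_{R-R_1-R_2}) \tag{23}$$

Now we note from (22) that $p_{\alpha_1\alpha_2}$ is given by a linear sum over the probabilities
$p_{l_1 l_2}$ where $l_1 l_2 \in \Omega_1 \times \Omega_2$. It may even be the case that we do not need all of
these probabilities. Hence, it follows that $\Omega_{12} \subseteq \Omega_1 \times \Omega_2$ as required.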

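We will now explain second level physical compression. This is the physical
compression that happens for a composite region over and above first level
physical compression for the component regions. From (20) and (22) we have

$$\begin{aligned}
p_{\alpha_1\alpha_2} &= \mathbf{r}_{\alpha_1\alpha_2}(R_1 \cup R_2) \cdot \mathbf{p}(R_1 \cup R_2) \\
&= \sum_{l_1 l_2} r^{\alpha_1}_{l_1} r^{\alpha_2}_{l_2}\, p_{l_1 l_2} \\
&= \sum_{l_1 l_2} r^{\alpha_1}_{l_1} r^{\alpha_2}_{l_2}\, \mathbf{r}_{l_1 l_2} \cdot \mathbf{p}(R_1 \cup R_2)
\end{aligned}$$

Since we can find a spanning set of linearly independent states $\mathbf{p}(R_1 \cup R_2)$, we
must have

$$\mathbf{r}_{\alpha_1\alpha_2}(R_1 \cup R_2) = \sum_{l_1 l_2} r^{\alpha_1}_{l_1} r^{\alpha_2}_{l_2}\, \mathbf{r}_{l_1 l_2}(R_1 \cup R_2) \tag{24}$$

We define

$$\Lambda_{l_1 l_2}^{k_1 k_2} \equiv r^{l_1 l_2}_{k_1 k_2} \tag{25}$$

where $r^{l_1 l_2}_{k_1 k_2}$ is the $k_1 k_2$ component of $\mathbf{r}_{l_1 l_2}$. Hence,

$$r^{\alpha_1\alpha_2}_{k_1 k_2} = \sum_{l_1 l_2} r^{\alpha_1}_{l_1} r^{\alpha_2}_{l_2}\, \Lambda_{l_1 l_2}^{k_1 k_2} \tag{26}$$

This equation tells us that if we know $\Lambda_{l_1 l_2}^{k_1 k_2}$ then we can calculate $\mathbf{r}_{\alpha_1\alpha_2}(R_1 \cup R_2)$
for the composite region $R_1 \cup R_2$ from the corresponding vectors $\mathbf{r}_{\alpha_1}(R_1)$ and
$\mathbf{r}_{\alpha_2}(R_2)$ for the component regions $R_1$ and $R_2$. Hence the matrix $\Lambda_{l_1 l_2}^{k_1 k_2}$ encodes
the second level physical compression (the physical compression over and above
the first level physical compression of the component regions). We can use it to
define the causaloid product

$$\mathbf{r}_{\alpha_1\alpha_2}(R_1 \cup R_2) = \mathbf{r}_{\alpha_1}(R_1) \otimes^{\Lambda} \mathbf{r}_{\alpha_2}(R_2) \tag{27}$$

where the components are given by (26). The causaloid product generalises and
unifies the various products for quantum theory discussed in Section 2 (though
in the context of a more general framework - we will show in Section 5 how
quantum theory fits into this framework).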
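
The causaloid product can be sketched directly from its component form, $r^{\alpha_1\alpha_2}_{k_1 k_2} = \sum_{l_1 l_2} r^{\alpha_1}_{l_1} r^{\alpha_2}_{l_2} \Lambda_{l_1 l_2}^{k_1 k_2}$. The following Python sketch is illustrative only: the function name `causaloid_product`, the dictionary layout, and all numerical entries are assumptions made for this example and do not come from any particular physical theory.

```python
from fractions import Fraction

def causaloid_product(r1, r2, lam):
    """Components r^{a1 a2}_{k1 k2} = sum_{l1 l2} r1[l1] * r2[l2] * lam[(l1, l2)][(k1, k2)].

    r1, r2 : dict mapping a fiducial label to a component of the r vector
    lam    : dict mapping (l1, l2) to a dict {(k1, k2): Lambda^{k1 k2}_{l1 l2}}
    """
    out = {}
    for l1, a in r1.items():
        for l2, b in r2.items():
            for k, coeff in lam[(l1, l2)].items():
                out[k] = out.get(k, 0) + a * b * coeff
    return out

half = Fraction(1, 2)
r_a1 = {1: half, 2: half}          # illustrative r vector for region R_1
r_a2 = {1: Fraction(1), 2: Fraction(0)}  # illustrative r vector for region R_2

# No second level compression: Omega_12 = Omega_1 x Omega_2 and Lambda is trivial,
# so the causaloid product reduces to the ordinary outer product.
trivial = {(l1, l2): {(l1, l2): Fraction(1)} for l1 in (1, 2) for l2 in (1, 2)}
uncompressed = causaloid_product(r_a1, r_a2, trivial)

# Non-trivial compression: Omega_12 = {(1, 1), (1, 2)}, a proper subset of the
# cartesian product. The entries below are made-up numbers for illustration.
compressing = {
    (1, 1): {(1, 1): Fraction(1)},
    (1, 2): {(1, 2): Fraction(1)},
    (2, 1): {(1, 1): half, (1, 2): half},
    (2, 2): {(1, 2): Fraction(1)},
}
compressed = causaloid_product(r_a1, r_a2, compressing)
```

In the compressed case the resulting vector has only $|\Omega_{12}| = 2$ components rather than $|\Omega_1 \times \Omega_2| = 4$.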

We can implement second level physical compression for more than two
regions by applying the same reasoning. Thus, for multi-region physical
compression, we implement

$$l_1 l_2 \cdots l_n \longrightarrow k_1 k_2 \cdots k_n \tag{28}$$

in going from $\Omega_1 \times \Omega_2 \times \cdots \times \Omega_n$ to $\Omega_{12 \ldots n}$ where the matrix

$$\Lambda_{l_1 l_2 \cdots l_n}^{k_1 k_2 \cdots k_n} \tag{29}$$

encodes the second level physical compression.



4.4 Third level physical compression 

Finally, we come to third level physical compression. We can consider all regions
to be composite regions made from elementary regions $R_x$, $R_{x'}$, $R_{x''}$, etc. Then
we generate the following set of $\Lambda$ matrices.

$$\begin{pmatrix}
\Lambda_{\alpha_x}^{l_x} & \text{for all } x \in \mathcal{O}_R \\
\Lambda_{l_x l_{x'}}^{k_x k_{x'}} & \text{for all } x, x' \in \mathcal{O}_R \\
\Lambda_{l_x l_{x'} l_{x''}}^{k_x k_{x'} k_{x''}} & \text{for all } x, x', x'' \in \mathcal{O}_R \\
\vdots & \vdots
\end{pmatrix} \tag{30}$$

where $\mathcal{O}_R$ is the set of $x$ in region $R$. Given these $\Lambda$ matrices we can calculate
the $\mathbf{r}$ vectors for any measurement outcome for any region using the causaloid
product. Now, just as the probabilities are related to one another by the physical
theory (thus enabling first and second level physical compression), we might
expect that these $\Lambda$ matrices are related to one another enabling us to calculate
all of them from a smaller set. Hence, we expect to be able to enact a third
level of physical compression where the object

$$\mathbf{\Lambda} = (\text{subset of } \Lambda\text{'s} \,|\, \text{RULES}) \tag{31}$$
enables us to calculate an arbitrary lambda matrix from the given subset (where
RULES are a set of rules for doing this). Such third level physical compression is,
indeed, possible. In Sec. 5 we will show how it is enacted in quantum theory. We
will call $\mathbf{\Lambda}$ the causaloid (because it contains information about the propensities
for different causal structures). This is the central mathematical object in this
paper. For any particular physical theory the causaloid is fixed (this is modulo
certain qualifications concerning what might be regarded as boundary conditions
that come from the conditioning $C_{V-R}$, though these issues will, most likely,
go away once we are in a cosmological setting⁵). In fact, once we know the
causaloid we can perform any calculation possible in the physical theory (see
Sec. 4.5). Consequently, the causaloid can be regarded as a specification of a
physical theory itself.

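The third level physical compression is accomplished by using identities
relating $\Lambda$ matrices. We can use these to calculate higher order $\Lambda$ matrices (having
more indices and corresponding to larger regions) from lower order ones when
certain conditions on the $\Omega$ sets are satisfied. We will state some identities of
this form without proof. First, when $\Omega$ sets multiply so do $\Lambda$ matrices.

$$\Lambda_{l_x \cdots l_{x'} l_{x''} \cdots l_{x'''}}^{k_x \cdots k_{x'} k_{x''} \cdots k_{x'''}} = \Lambda_{l_x \cdots l_{x'}}^{k_x \cdots k_{x'}} \, \Lambda_{l_{x''} \cdots l_{x'''}}^{k_{x''} \cdots k_{x'''}} \quad \text{if} \quad \Omega_{x \cdots x' x'' \cdots x'''} = \Omega_{x \cdots x'} \times \Omega_{x'' \cdots x'''} \tag{32}$$

Second, there exists a family of identities from which $\Lambda$ matrices for composite
regions can be calculated from some pairwise matrices (given certain conditions
on the sets). The first of this family is

$$\Lambda_{l_1 l_2 l_3}^{k_1 k_2 k_3} = \sum_{k_2' \in \Omega_{2\not{3}}} \Lambda_{l_1 k_2'}^{k_1 k_2} \, \Lambda_{l_2 l_3}^{k_2' k_3} \quad \text{if} \quad \Omega_{123} = \Omega_{12} \times \Omega_{\not{2}3} \ \text{and} \ \Omega_{23} = \Omega_{2\not{3}} \times \Omega_{\not{2}3} \tag{33}$$

where the notation $\Omega_{\not{2}3}$ means that we form the set of all $k_3$ for which there
exists $k_2 k_3 \in \Omega_{23}$. The second in this family of identities is

$$\Lambda_{l_1 l_2 l_3 l_4}^{k_1 k_2 k_3 k_4} = \sum_{k_2' \in \Omega_{2\not{3}},\; k_3' \in \Omega_{3\not{4}}} \Lambda_{l_1 k_2'}^{k_1 k_2} \, \Lambda_{l_2 k_3'}^{k_2' k_3} \, \Lambda_{l_3 l_4}^{k_3' k_4} \quad \text{if} \quad \begin{cases} \Omega_{1234} = \Omega_{12} \times \Omega_{\not{2}3} \times \Omega_{\not{3}4} \\ \Omega_{23} = \Omega_{2\not{3}} \times \Omega_{\not{2}3} \\ \Omega_{34} = \Omega_{3\not{4}} \times \Omega_{\not{3}4} \end{cases} \tag{34}$$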

and so on. These identities are elementary to prove (see jS])- 
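
As an indication of how such identities arise, the following is a sketch (not the proof from the cited reference) of the first member of the family, written for fiducial labels $l_1 l_2 l_3$ and assuming the two stated conditions on the $\Omega$ sets. The primed label $k_2'$ is an intermediate summation index.

```latex
% Step 1: compress regions 2 and 3, treating l_1 in R_1 as part of the
% generalised preparation. Since \Omega_{23} = \Omega_{2\not 3} \times \Omega_{\not 2 3},
% the labels k_2' and k_3 range over product sets.
p_{l_1 l_2 l_3} = \sum_{k_2' \in \Omega_{2\not 3}} \sum_{k_3 \in \Omega_{\not 2 3}}
    \Lambda_{l_2 l_3}^{k_2' k_3} \, p_{l_1 k_2' k_3}

% Step 2: compress regions 1 and 2, treating k_3 in R_3 as part of the
% generalised preparation (k_2' \in \Omega_2, so l_1 k_2' \in \Omega_1 \times \Omega_2).
p_{l_1 k_2' k_3} = \sum_{k_1 k_2 \in \Omega_{12}} \Lambda_{l_1 k_2'}^{k_1 k_2} \, p_{k_1 k_2 k_3}

% Combining the two steps:
p_{l_1 l_2 l_3} = \sum_{k_1 k_2 \in \Omega_{12}} \sum_{k_3 \in \Omega_{\not 2 3}}
    \Big( \sum_{k_2' \in \Omega_{2\not 3}} \Lambda_{l_1 k_2'}^{k_1 k_2} \Lambda_{l_2 l_3}^{k_2' k_3} \Big) \, p_{k_1 k_2 k_3}

% If \Omega_{123} = \Omega_{12} \times \Omega_{\not 2 3}, the p_{k_1 k_2 k_3} are exactly the
% fiducial probabilities for the composite region, so the bracket is
% \Lambda_{l_1 l_2 l_3}^{k_1 k_2 k_3}.
```

The same two-step argument, applied once more, gives the four-region identity.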



4.5 Using the causaloid to calculate correlations 

Once we have the causaloid, we can use it to calculate any $\mathbf{r}_{\alpha_1}(R_1)$ for any $\alpha_1$ and
for any region (whether composite or elementary) by using the causaloid product
(using $\Lambda_{\alpha_x}^{l_x}$ from first level physical compression to get the components of the
$\mathbf{r}_{\alpha_x}(R_x)$ vectors for the elementary regions to get us started). The causaloid
can be used to calculate conditional probabilities as we require of the formalism.
Note

$$p = \mathrm{Prob}(Y^{\alpha_2}_{R_2} \,|\, Y^{\alpha_1}_{R_1}, F^{\alpha_1}_{R_1}, F^{\alpha_2}_{R_2}) = \frac{\mathbf{r}_{\alpha_1\alpha_2}(R_1 \cup R_2) \cdot \mathbf{p}(R_1 \cup R_2)}{\sum_{\beta_2} \mathbf{r}_{\alpha_1\beta_2}(R_1 \cup R_2) \cdot \mathbf{p}(R_1 \cup R_2)} \tag{35}$$

where $\beta_2$ runs over all outcomes for the measurement associated with $\alpha_2$ (recall
that $\alpha_2$ labels a particular outcome of a particular measurement). Therefore

1. $p$ is well defined iff

$$\mathbf{r}_{\alpha_1\alpha_2}(R_1 \cup R_2) \ \text{is parallel to} \ \sum_{\beta_2} \mathbf{r}_{\alpha_1\beta_2}(R_1 \cup R_2) \tag{36}$$

because this is the only way for the probability to be independent of the
state $\mathbf{p}(R_1 \cup R_2)$ (as it must since the state is associated with a generalised
preparation outside $R_1 \cup R_2$) since there exists a linearly independent
spanning set of such states.

2. If $p$ is well defined then it is given by

$$\mathbf{r}_{\alpha_1\alpha_2}(R_1 \cup R_2) = p \sum_{\beta_2} \mathbf{r}_{\alpha_1\beta_2}(R_1 \cup R_2) \tag{37}$$

(i.e. equal to the ratio of the lengths of the vectors).

This works for any pair of regions. Hence, if we know the causaloid we can
calculate whether any probability is well defined and we can calculate its value
if it is - this is the task we set ourselves at the end of Sec. 3.
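
The two-step test above (is the probability well defined, and if so what is its value) can be sketched as follows. This is a minimal illustration: the function name `conditional_probability` and the numerical vectors are hypothetical, and the vectors are assumed to have already been obtained from the causaloid product.

```python
from fractions import Fraction

def conditional_probability(r_joint, r_all_outcomes):
    """Return p if r_joint == p * sum(r_all_outcomes) for a single scalar p.

    Returns None when the vectors are not parallel (the probability is not
    well defined) or when the summed vector is zero (the ratio is 0/0).
    """
    total = [sum(col) for col in zip(*r_all_outcomes)]
    p = None
    for v, u in zip(r_joint, total):
        if u == 0:
            if v != 0:
                return None   # a nonzero component facing a zero one: not parallel
            continue
        ratio = Fraction(v) / Fraction(u)
        if p is None:
            p = ratio
        elif ratio != p:
            return None       # component ratios disagree: not parallel
    return p

q, h = Fraction(1, 4), Fraction(1, 2)

# Parallel case: the two outcome vectors sum to (1/2, 1), twice the joint vector.
well_defined = conditional_probability([q, h], [[q, h], [q, h]])

# Non-parallel case: the ratio would depend on the state, so there is no answer.
ill_defined = conditional_probability([q, 0], [[q, 0], [q, h]])
```

In the first case the probability is $1/2$ whatever the state outside the two regions; in the second case the function returns `None`, signalling that the conditional probability depends on what happens elsewhere.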

5 Formulating quantum theory in the causaloid 
framework 

We will show that the theory for an arbitrary number of pairwise interacting
qubits can be formulated within this framework. Universal quantum computation
can be carried out with such a system and so we will regard this as being
general enough for our purposes. First, consider a single quantum system (which
may be a qubit) acted upon by a sequence of transformations/measurements
labelled by $t = 1, 2, \ldots, T$. We can visualise this as a sequence of boxes where
each box has a knob for the setting, $F(t)$, of the particular measurement being
implemented and some meters which record the outcome $s_t$ of the measurement.
We record $(t, F(t), s_t)$ on a card for each $t$. In quantum theory such a
measurement/transformation is associated with a set of completely positive trace
non-increasing linear maps (or superoperators) $\{\$_{(t,F(t),s_t)}\}$ such that $\sum_{s_t} \$_{(t,F(t),s_t)}$
(the sum is over all outcomes associated with a given measurement choice and
a given $t$) is trace preserving. In our previous notation, $t$ plays the role of $x$,
the elementary regions are $R_t$ (equal to the set of all cards that can have $t$ on
them), and $(Y_t, F_t)$ corresponds to (outcome, setting) in $R_t$. Further, we label
each possible $(Y_t, F_t)$ by $\alpha_t$ in accordance with our previous notation.
Superoperators act on the input state to produce an output state

$$\rho(t+1) = \$_{\alpha_t}(\rho(t)) \tag{38}$$
Two important examples of superoperators are the unitary map $\rho \rightarrow U\rho U^\dagger$
(which preserves the trace) and the projection map $\rho \rightarrow P\rho P$ (which decreases
the trace in general). In general, the probability of seeing the sequence of
outcomes $s_1, s_2, \ldots, s_T$, given some procedure $F(t)$, is given by

$$\mathrm{prob}(Y_T, Y_{T-1}, \ldots, Y_1 \,|\, F_T, F_{T-1}, \ldots, F_1, \rho(0)) = \mathrm{trace}[\$_{\alpha_T} \circ \$_{\alpha_{T-1}} \circ \cdots \circ \$_{\alpha_1}(\rho(0))] \tag{39}$$
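
As a small worked instance of this trace formula, the following Python sketch composes two superoperators on a single qubit: a unitary (the Hadamard gate) followed by a projection onto a computational basis state. The helper names (`matmul`, `dagger`, `kraus_map`, `sequence_probability`) are hypothetical, and each superoperator is assumed to have the simple form $\rho \rightarrow A \rho A^\dagger$ for a single operator $A$.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose of a matrix stored as nested lists."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def kraus_map(a):
    """Superoperator rho -> A rho A^dagger built from a single operator A."""
    return lambda rho: matmul(matmul(a, rho), dagger(a))

def sequence_probability(superoperators, rho0):
    """trace[$_T o ... o $_1 (rho0)], superoperators listed in time order."""
    rho = rho0
    for s in superoperators:
        rho = s(rho)
    return sum(rho[i][i] for i in range(len(rho))).real

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]        # unitary map (trace preserving)
P0 = [[1, 0], [0, 0]]        # projection onto |0> (trace decreasing in general)
P1 = [[0, 0], [0, 1]]        # projection onto |1>
rho0 = [[1, 0], [0, 0]]      # initial state |0><0|

prob_0 = sequence_probability([kraus_map(H), kraus_map(P0)], rho0)
prob_1 = sequence_probability([kraus_map(H), kraus_map(P1)], rho0)
prob_0_then_1 = sequence_probability([kraus_map(P0), kraus_map(P1)], rho0)
```

The two outcome probabilities after the Hadamard gate sum to one, reflecting the fact that the sum of the superoperators over all outcomes of a given measurement is trace preserving.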

Now let us consider one elementary region $R_t$. We will write the probability
in (39) as

$$p_{\alpha_t} = \mathrm{trace}[\$_T \circ \cdots \circ \$_{\alpha_t} \circ \cdots \circ \$_1(\rho(0))] \tag{40}$$

where we have suppressed $\alpha$'s from our notation except at the crucial time
$t$. Now note that, since superoperators are linear, we can expand a general
superoperator in terms of a linearly independent fiducial set. We will label the
fiducial set by $l_t \in \Omega_t$ (we have $|\Omega_t| = N^4$ where $N$ is the dimension of the
Hilbert space for the system under consideration). Thus, we can write

$$\$_{\alpha_t} = \sum_{l_t \in \Omega_t} r^{\alpha_t}_{l_t} \, \$_{l_t} \tag{41}$$

where $\$_{l_t}$ is the fiducial set (this is not a unique choice). Putting this into (40)
gives

$$p_{\alpha_t} = \mathbf{r}_{\alpha_t} \cdot \mathbf{p} \tag{42}$$

where we are using our previous notation. The $\Lambda$ matrices for the elementary
regions are then given by $\Lambda_{\alpha_t}^{l_t} = r^{\alpha_t}_{l_t}$ obtained by solving the set of linear
equations (41). This accomplishes first level physical compression for the elementary
regions $R_t$ for a single quantum system going from label $\alpha_t$ to label $l_t$.
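
The size of the fiducial set, $N^4$, can be checked by a dimension count. The following is a sketch under an assumption not spelled out above: that the relevant linear space is the real vector space of Hermiticity-preserving linear maps on $N \times N$ matrices, which is in one-to-one correspondence with Hermitian $N^2 \times N^2$ (Choi) matrices.

```latex
\dim_{\mathbb{R}} \{ H \in \mathbb{C}^{N^2 \times N^2} : H = H^\dagger \}
  = \underbrace{N^2}_{\text{real diagonal entries}}
  + \underbrace{2 \cdot \tfrac{N^2 (N^2 - 1)}{2}}_{\text{complex entries above the diagonal}}
  = N^4 ,
\qquad \text{e.g. } N = 2 \;\Rightarrow\; |\Omega_t| = 16 .
```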
Now we will write the probability in (39) as

$$p_{\alpha_{t'}\alpha_t} = \mathrm{trace}[\$_T \circ \cdots \circ \$_{\alpha_{t'}} \circ \cdots \circ \$_{\alpha_t} \circ \cdots \circ \$_1(\rho(0))] \tag{43}$$

where we have suppressed $\alpha$'s from our notation except at times $t$ and $t' > t$. If
$t' = t + 1$ then these two times are immediately sequential. For a reason that
will soon become apparent, we will choose the first member of each fiducial set
of superoperators to be equal to the identity map so we have $\$_1 = I$ (where $I$
is the identity map). Then we can write

$$\$_{\alpha_{t'}} \circ \$_{\alpha_t} = \sum_{k_t \in \Omega_t} r^{\alpha_{t'}\alpha_t}_{1 k_t} \, \$_1 \circ \$_{k_t} \tag{44}$$

since the composition of two superoperators using $\circ$ is a map on $\rho$ and lives in
the same space as a single superoperator and so we can expand the composition
in terms of a fiducial set of linearly independent superoperators at one time.
This means that

$$\Omega_{t't} = \{1\} \times \Omega_t \quad \text{if} \quad t' = t + 1 \tag{45}$$
and we see that we have non-trivial physical compression. The $\Lambda$ matrices for
this second level physical compression of pairs of sequential elementary regions
are given by

$$\Lambda_{l_{t'} l_t}^{k_{t'} k_t} = r^{l_{t'} l_t}_{k_{t'} k_t} \tag{46}$$

by solving (44). The same technique works when we have any number of
immediately sequential regions. For three immediately sequential regions we have

$$\Omega_{t''t't} = \{1\} \times \{1\} \times \Omega_t \quad \text{if} \quad t'' = t' + 1 = t + 2 \tag{47}$$

and so on.

In the case that we have non-sequential times $t$ and $t'$ there is no physical
compression and

$$\Omega_{t't} = \Omega_{t'} \times \Omega_t \quad \text{if} \quad t' > t + 1 \tag{48}$$

Proof of this requires careful consideration of the form of (43) above. We will
omit this proof here. However, the physical reason for this is that different
choices of intervening superoperators break the possibility of any tight
correlations between the two regions and so there is no physical compression. The
same is true for any two clumps of regions with a gap.

$$\Omega_{t''' \cdots t'' t' \cdots t} = \Omega_{t''' \cdots t''} \times \Omega_{t' \cdots t} \quad \text{if} \quad t'' > t' + 1 \tag{49}$$

We now come to third level physical compression. We can implement third
level physical compression by noticing the following. First note that we can
divide any composite region into a set of regions which we will call "clumps"
where the regions in each clump are immediately sequential, and where there are
gaps between the clumps. Now note that (45) (and its generalisations, such as
(47), to any number of immediately sequential regions) satisfies the conditions
on $\Omega$ sets such that identity (33) (and its generalisations such as (34)) hold.
Hence, for each clump of immediately sequential regions we can calculate the $\Lambda$
matrix employing this family of identities using just the $\Lambda$ matrices for pairs of
immediately sequential regions. Secondly, we see that (49) satisfies the condition
for identity (32) to hold, so that we can simply multiply the $\Lambda$ matrices from
each clump to get the $\Lambda$ matrix we are looking for. We will call this method the
"clumping method". This means that we can write the causaloid for a single
system in quantum theory as

$$\mathbf{\Lambda} = (\Lambda_{\alpha_\tau}^{l_\tau},\ \Lambda_{l_{\tau+1} l_\tau}^{k_{\tau+1} k_\tau} \,|\, \text{RULES} = \text{clumping method}) \tag{50}$$

where $\tau$ is some particular time $t$ (we only need specify these matrices for one
$\tau$ since they will be the same for all other $t$ by symmetry).
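
The first step of the clumping method, dividing a composite region into clumps, is easy to sketch. In the Python sketch below the function names are hypothetical. The size of the fiducial set is computed on the assumption, read off from the sequential and gapped cases above, that each clump contributes a single factor $|\Omega_t| = N^4$ and that the factors for different clumps multiply.

```python
def clumps(times):
    """Split a set of times into maximal runs of immediately sequential times."""
    groups = []
    for t in sorted(set(times)):
        if groups and t == groups[-1][-1] + 1:
            groups[-1].append(t)   # t continues the current clump
        else:
            groups.append([t])     # a gap: start a new clump
    return groups

def fiducial_set_size(times, n):
    """|Omega| for the composite region: one factor n**4 per clump."""
    return (n ** 4) ** len(clumps(times))

region = [1, 2, 3, 6, 7, 10]
region_clumps = clumps(region)
compressed_size = fiducial_set_size(region, 2)   # single qubit, N = 2
uncompressed_size = (2 ** 4) ** len(region)      # size with no second level compression
```

For this region the fiducial set shrinks from $16^6$ to $16^3$ labels, which is the second level physical compression that the clumping method exploits.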

Now we will consider pairwise interacting qubits. Examples of such pairwise
interactions are given in the accompanying figure (we will call these causaloid
diagrams). Let each qubit be labelled by $i$. The qubits are shown by the thin lines. The nodes
represent the elementary regions. If two qubits pass through a node then they
can interact in that elementary region. Nodes are labelled by $x$. Adjacent
nodes (between which a qubit passes) are represented by links. If we consider
a single qubit $i$ then a sequence of times for this qubit is associated with the
sequence of labels $x$ along the thin line. We can build up the causaloid for
this system of interacting qubits by extending the methods above. To do this
consider a node, $x$, at which two qubits, labelled by $i$ and $j$, interact. We can
act on these two qubits jointly with some measurement/transformation. This
will be associated with a set of superoperators $\$_{\alpha_x}$. A special subset of these
superoperators are those that can be written in tensor product form $\$_{\alpha_{xi}} \otimes \$_{\alpha_{xj}}$,
where $\alpha_x = (\alpha_{xi}, \alpha_{xj})$ in these cases. A subset of these are $\$_{l_{xi}} \otimes \$_{l_{xj}}$ where
$l_{xi} \in \Omega_{xi}$ labels a fiducial set of linearly independent superoperators on qubit
$i$, and similarly for $j$. Now, it turns out that this particular set of product
form superoperators form a complete linearly independent set for the general
superoperators on the two qubits. That is, we can write

$$\$_{\alpha_x} = \sum_{l_{xi} l_{xj}} r^{\alpha_x}_{l_{xi} l_{xj}} \, \$_{l_{xi}} \otimes \$_{l_{xj}} \tag{51}$$
This means we can use fiducial measurements for which the qubits effectively
decouple. For each qubit we can apply the clumping method to find the causaloid
for that qubit. Since the qubits effectively decouple for the fiducial
measurements, the $\Omega$ sets for composite regions involving more than one qubit will
factorise between the qubits. Hence, a general $\Lambda$ matrix involving more than
one qubit can be obtained by multiplying the corresponding $\Lambda$ matrices for each
qubit. Then, to couple the qubits, we need only add the full specification of the
local lambda matrices

$$\Lambda_{\alpha_x}^{l_{xi} l_{xj}} = r^{\alpha_x}_{l_{xi} l_{xj}} \tag{52}$$

which can be calculated from (51). Hence, the causaloid is given by

$$\mathbf{\Lambda} = \left( \{\Lambda_{\alpha_x}^{l_{xi} l_{xj}} \ \forall\, x\},\ \{\Lambda_{l_{xi} l_{x'i}}^{k_{xi} k_{x'i}} \ \forall \text{ adjacent } x, x'\} \ \middle|\ \text{clumping method, causaloid diagram} \right) \tag{53}$$

Note, if a node only has one qubit passing through it then we list $\Lambda_{\alpha_x}^{l_{xi}}$ rather
than $\Lambda_{\alpha_x}^{l_{xi} l_{xj}}$. There is quite considerable physical compression at the third level.
If there are $M$ nodes, then we only need to list of order $M$ matrices (and these
are low order matrices having only a small number of indices) even though the
number of possible $\Lambda$ matrices grows exponentially with $M$. We will, most
likely, be able to obtain further third level physical compression since symmetry
considerations will mean that we do not have to list separately the $\Lambda$ matrices
for all $x$ and for all adjacent $x, x'$.
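
The scale of this third level compression can be illustrated with a simple count. The sketch below assumes (as a simplification made here) that the causaloid lists one local matrix per node and one pairwise matrix per link of the causaloid diagram, and that there is one possible $\Lambda$ matrix for every non-empty set of nodes. The function names and the example numbers are hypothetical.

```python
def listed_matrices(num_nodes, num_links):
    """Matrices listed in the causaloid: one per node plus one per link."""
    return num_nodes + num_links

def possible_matrices(num_nodes):
    """One Lambda matrix per non-empty set of elementary regions."""
    return 2 ** num_nodes - 1

# Illustrative diagram: 20 nodes joined by 30 links.
listed = listed_matrices(20, 30)
possible = possible_matrices(20)
```

With these numbers, 50 listed matrices stand in for 1,048,575 possible ones, and the gap widens exponentially as nodes are added.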



6 Ideas on how to formulate General Relativity 
in the Causaloid framework 

General relativity has not yet been put into the causaloid framework. Such a 
formulation of GR would be operational. One idea is to pursue a line of thought 
suggested by Einstein. He says 
If, for example, events consisted merely in the motion of material
points, then ultimately nothing would be observable but the
meetings of two or more of these points [3].

Thus, the data written onto a card would be a list of particles (assume each of 
these particles is labelled) which are proximate. We would collect many such 
cards forming a stack. The purpose of the physical theory would be to correlate 
the data on these cards - and hence we would expect the causaloid formalism to 
work for this purpose. There are a few problems with this approach. Einstein 
introduces metric notions and it is not clear how this could be recovered merely 
by looking at sets of coincidences. One possible way to solve this problem would 
be to equip each point particle with a clock and record the time of each particle's 
clock on the card also. Another problem is that the causaloid formalism is 
discrete rather than continuous. There are discrete formulations of GR jB], 
but these tend to be in the canonical picture. Nevertheless, we would probably 
be satisfied with a discrete formulation of GR in the causaloid framework - 
especially if it turns out that QG is, itself, a discrete theory since then GR 
would just be the continuous limit of a discrete theory. Unlike GR, the causaloid 
framework has a notion of agency (there are knob settings). However, no agency 
is a special case of agency (where there is only one choice) so this need not be 
a problem. Alternatively, we could try to recover the notion of agency in GR. 
For example, we could consider tiny differences in the matter distribution (such 
as those in the brain) which are below the resolution of our experiment to be 
magnified so they are above the resolution. This could be modelled in GR. 

The theory we really want is what might be called probabilistic GR (ProbGR) . 
This would be to GR what statistical mechanics is to Newtonian mechanics. One 
problem with formulating ProbGR is that normally, when we formulate a sta- 
tistical version of a deterministic theory, we take a mixture of definite states 
across space at a definite time. However, this would require a 3 + 1 splitting
against the spirit of GR and certainly against the spirit of QG. However, the 
causaloid framework would be a natural setting for ProbGR without introducing 
any such splitting. 

7 Ideas on how to formulate quantum gravity in 
the causaloid framework 

There are two strategies we might adopt to find a theory of QG in the causaloid 
framework. First, we could formulate both QT and ProbGR in this framework 
and then hope that some way of combining the essential features of the two 
theories presents itself. The "map" that takes us from CProbT to QT could 
be applied to ProbGR to get QG. This approach might work. However, from a 
conceptual point of view it is not necessarily so clean. We are taking two less 
fundamental theories as part of the process by which we obtain a more funda- 
mental one. An alternative approach would be to attempt to derive a theory of 
QG within the causaloid framework from scratch by invoking some deep principles.
For example, we might attempt to formulate the equivalence principle
in a sufficiently general way that it applies to the causaloid framework. This is 
clearly a much more difficult route to get started on. In practice, some combina- 
tion of these two approaches is most likely to be successful. It is likely that, by 
having the two less fundamental theories formulated in the same framework, we 
will be in a better position to extract principles from which QG can be derived. 

8 Conclusions 

A theory of QG is likely to have features that neither GR nor QT have. For
example, in GR and QT there is a definite matter of fact as to whether an 
interval is timelike or not (in QT this is specified in advance whereas in GR we 
know this only after solving the equations). The strategy we have adopted to 
work towards the construction of QG is to construct a framework, the causaloid 
formalism, which is likely to be general enough to contain QG as a special 
case. This is essential since if we work in a framework that cannot, in principle, 
contain QG then we have no chance of formulating QG in the given framework. 
The causaloid formalism does contain QT and it is likely to contain GR. 

The formulation of QT in this framework uses a notion of "generalised
preparation". An example of this is pre- and post-selection in the framework of
Aharonov, Bergmann, and Lebowitz (ABL).

In QG it is likely that we will lose the notion of an external time unaffected 
by what happens in the experiment. This is likely to imply that we cannot 
have unitary evolution. More accurately, it is likely to imply that the theory 
which results when we take that limiting case of QG that approximates QT 
will not quite have unitary evolution. This might be consistent with collapse 
models (such as those of Ghirardi, Rimini and Weber [9], and Pearle [10]).
The possibility of a connection between gravity and non-unitary evolution does,
of course, have a long history (see in particular [11], [12], and for a different take
see [13]). However, the situation might actually be more subtle. It is possible
that, unlike in collapse models, the theory will remain time-symmetric (in so 
much as such a notion makes sense in the absence of fixed causal structure) 
just as the formulation of ABL is time symmetric. Collapse models employ the 
notion of an evolving state at a fundamental level whilst such a notion is unlikely 
to be fundamental in QG. But since the measurement problem is a fundamental 
problem, we would like its solution to be implicit in the fundamental formulation 
of QG rather than just in the limiting case of QT. This raises deep questions 
concerning whether collapse is the right way to solve the measurement problem. 

Dedication 

It is a great honour to dedicate this paper to Giancarlo Ghirardi. One lesson 
implicit in his work on collapse models, and particularly taken to heart here, 
is that we should think of modifying quantum theory in a hope to go beyond 
our present theories. Only then can we hope for experimental discrimination
between theories.